21 research outputs found

    Finding Temporal Patterns in Noisy Longitudinal Data: A Study in Diabetic Retinopathy

    This paper describes an approach to temporal pattern mining using the concept of user defined temporal prototypes to define the nature of the trends of interest. The temporal patterns are defined in terms of sequences of support values associated with identified frequent patterns. The prototypes are defined mathematically so that they can be mapped onto the temporal patterns. The focus for the advocated temporal pattern mining process is a large longitudinal patient database collected as part of a diabetic retinopathy screening programme. The data set is, in itself, also of interest as it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The diabetic retinopathy application, the data warehousing and cleaning process, and the frequent pattern mining procedure (together with the application of the prototype concept) are all described in the paper. An evaluation of the frequent pattern mining process is also presented.

    Information extraction from free text for aiding transdiagnostic psychiatry: constructing NLP pipelines tailored to clinicians’ needs

    Background: Developing predictive models for precision psychiatry is challenging because of unavailability of the necessary data: extracting useful information from existing electronic health record (EHR) data is not straightforward, and available clinical trial datasets are often not representative of heterogeneous patient groups. The aim of this study was to construct a natural language processing (NLP) pipeline that extracts variables for building predictive models from EHRs. We specifically tailor the pipeline for extracting information on outcomes of psychiatry treatment trajectories, applicable throughout the entire spectrum of mental health disorders (“transdiagnostic”). Methods: A qualitative study into beliefs of clinical staff on measuring treatment outcomes was conducted to construct a candidate list of variables to extract from the EHR. To investigate if the proposed variables are suitable for measuring treatment effects, resulting themes were compared to transdiagnostic outcome measures currently used in psychiatry research and compared to the HDRS (as a gold standard) through systematic review, resulting in an ideal set of variables. To extract these from EHR data, a semi-rule-based NLP pipeline was constructed and tailored to the candidate variables using Prodigy. Classification accuracy and F1-scores were calculated and pipeline output was compared to HDRS scores using clinical notes from patients admitted in 2019 and 2020. Results: Analysis of 34 questionnaires answered by clinical staff resulted in four themes defining treatment outcomes: symptom reduction, general well-being, social functioning and personalization. Systematic review revealed 242 different transdiagnostic outcome measures, with the 36-item Short-Form Survey for quality of life (SF36) being used most consistently, showing substantial overlap with the themes from the qualitative study. Comparing SF36 to HDRS scores in 26 studies revealed moderate to good correlations (0.62 to 0.79) and good positive predictive values (0.75 to 0.88). The NLP pipeline developed with notes from 22,170 patients reached an accuracy of 95 to 99 percent (F1 scores: 0.38 to 0.86) on detecting these themes, evaluated on data from 361 patients. Conclusions: The NLP pipeline developed in this study extracts outcome measures from the EHR that cater specifically to the needs of clinical staff and align with outcome measures used to detect treatment effects in clinical trials.

    Differences in Trial and Real-world Populations in the Dutch Castration-resistant Prostate Cancer Registry

    Background: Trials in castration-resistant prostate cancer (CRPC) treatment have shown improved outcomes, including survival. However, as trial populations are selected, results may not be representative of the real-world population. The aim of this study was to assess the differences between patients treated in a clinical trial versus standard care during the course of CRPC in a real-world CRPC population. Design, setting, and participants: The Castration-resistant Prostate Cancer Registry is a population-based, observational, retrospective registry. CRPC patients from 20 hospitals in the Netherlands have been included from 2010 to 2013. Outcome measurements and statistical analysis: Baseline characteristics, systemic treatment, and overall survival were the main outcomes. Descriptive statistics, multivariate Cox regression, and multiple imputations with the Monte Carlo Markov Chain method were used. Results and limitations: In total, 1524 patients were enrolled, of whom 203 patients had participated in trials at any time. The median follow-up period was 23 mo. Patients in the trial group were significantly younger and had fewer comorbidities. Docetaxel treatment was more freque

    Health-related Quality of Life and Pain in a Real-world Castration-resistant Prostate Cancer Population: Results From the PRO-CAPRI Study in the Netherlands

    Background: The purpose of this study was to determine generic, cancer-specific, and prostate cancer-specific health-related quality of life (HRQoL), pain and changes over time in patients with metastatic castration-resistant prostate cancer (mCRPC) in daily practice. Patients and Methods: PRO-CAPRI is an observational, prospective study in 10 hospitals in the Netherlands. Patients with mCRPC completed the EQ-5D, European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire (EORTC QLQ-C30), and Brief Pain Inventory-Short Form (BPI-SF) every 3 months and European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Prostate Cancer Module (EORTC QLQ-PR25) every 6 months for a maximum of 2 years. Subgroups were identified based on chemotherapy pretreatment. Outcomes were generic, cancer-specific, and prostate cancer-specific HRQoL and self-reported pain. Descriptive statistics were performed including changes over time and minimal important differences (MID) between subgroups. Results: In total, 151 included patients answered 873 questionnaires. The median follow-up from the start of the study was 19.5 months, and 84% were treated with at least 1 life-prolonging agent. Overall, patients were in good clinical condition (Eastern Cooperative Oncology Group performance status 0-1 in 78%) with normal baseline hemoglobin, lactate dehydrogenase, and alkaline phosphatase. At inclusion, generic HRQoL was high with a mean EQ visual analog score of 73.2 out of 100. The lowest scores were reported on role and physical functioning (mean scores of 69 and 76 of 100, respectively), and fatigue, pain, and insomnia were the most impaired domai

    Rulebase checking using a spatial representation


    Spatial reasoning: improving computational efficiency

    Analysing spatial data is often computationally intensive: even by the standards of contemporary technologies, the machine power needed is great and the processing times significant. This is particularly so in 3-D and 4-D scenarios. What we describe here is a technique that tackles this and associated problems. The technique is founded in the idea of quad-tesseral addressing, a technique that was originally applied to the analysis of atomic structures. It is based on ideas concerning hierarchical clustering developed in the 1960s and 1970s to improve data access time [G.M. Morton, A computer oriented geodetic database and a new technique on file sequencing, IBM Canada, 1966], and on atomic isohedral (same shape) tiling strategies developed in the 1970s and 1980s concerned with group theory [B. Grunbaum, G.C. Shephard, Tilings and Patterns, Freeman, New York, 1987]. The technique was first suggested as a suitable representation for GIS in the early 1980s when the two strands were brought together and a tesseral arithmetic applied [F.C. Holroyd, The Geometry of Tiling Hierarchies, Ars Combinatoria 16B (1983) 211-244; S.B.M. Bell, B.M. Diaz, F.C. Holroyd, M.J.J. Jackson, Spatially referenced methods of processing raster and vector data, Image and Vision Computing 1 (4) (1983) 211-220; B.M. Diaz, S.B.M. Bell, Spatial Data Processing Using Tesseral Methods, Natural Environment Research Council, Swindon, 1986]. Here, we describe how that technique can equally be applied to the analysis of environmental interaction with built forms. The technique deals with the problems described by first linearising the three-dimensional (3-D) space being investigated. Then, the reasoning applied to that space is applied within the same environment as the definition of the problem data. We show, with an illustrative example, how the technique can be applied. The problem then remains of how to visualise the results of the analysis so undertaken. We show how this has been accomplished so that the 3-D space and the results are represented in a way that facilitates rapid interpretation of the analysis that has been carried out.
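
The idea of linearising a 3-D space can be illustrated with a minimal sketch. The code below uses Morton (Z-order) bit interleaving, the file-sequencing idea the abstract cites, and is not the paper's tesseral arithmetic; the function names and the `bits` parameter are assumptions chosen for illustration.

```python
def morton_encode_3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y and z into one integer address."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)        # x takes bit positions 0, 3, 6, ...
        code |= ((y >> i) & 1) << (3 * i + 1)    # y takes bit positions 1, 4, 7, ...
        code |= ((z >> i) & 1) << (3 * i + 2)    # z takes bit positions 2, 5, 8, ...
    return code


def morton_decode_3d(code: int, bits: int = 10) -> tuple:
    """Recover (x, y, z) from an address produced by morton_encode_3d."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


# Hypothetical cell coordinates, used only to show the round trip.
address = morton_encode_3d(5, 9, 3)
cell = morton_decode_3d(address)
```

Cells that are close in 3-D tend to receive nearby integer addresses, which is what makes a single linear index usable for storage and range-style reasoning.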

    A Case Base Representation Technique to Support Case Based Reasoning

    A representation technique whereby Case Bases (CBs) used in Case Based Reasoning (CBR) applications can be encoded is described. The representation considers the CB in terms of a multi-dimensional space where each dimension represents a particular attribute, and the values for that dimension an enumeration for that attribute. The advantages offered are firstly that of compression --- the CB is reduced to a set of integers. Secondly, unlike other CB compression techniques, the representation supports interaction without necessitating "de-compression"; further the numeric nature of the representation ensures that this interaction is computationally effective. Thirdly, and significantly in the context of CBR, the representation supports mechanisms whereby CBR queries can be relaxed/expanded to support ideas concerning "similarity" retrieval. Fourthly, within certain machine dependent limitations, the technique operates regardless of the number of "dimensions" (attributes) under considerat..

    Frequent Pattern Trend Analysis in Social Networks

    This paper describes an approach to identifying and comparing frequent pattern trends in social networks. A frequent pattern trend is defined as a sequence of time-stamped occurrence (support) values for specific frequent patterns that exist in the data. The trends are generated according to epochs. Therefore, trend changes across a sequence of epochs can be identified. In many cases, a great many trends are identified and the result is difficult to interpret. With a combination of constraints, placed on the frequent patterns, and clustering and cluster analysis techniques, it is argued that analysis of the result is enhanced. The clustering technique uses a Self Organising Map approach to produce a sequence of maps, one per epoch. These maps can then be compared and the movement of trends identified. This Frequent Pattern Trend Mining framework has been evaluated using two non-standard types of social networks, the cattle movement network and the insurance quote network.
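
The definition of a trend as a sequence of support values, one per epoch, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's framework: records are modelled as sets of items, support is taken as the fraction of records in an epoch that contain the pattern, and the names `support`, `trend` and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Sequence


def support(pattern: FrozenSet[str], records: Sequence[FrozenSet[str]]) -> float:
    """Fraction of records in one epoch that contain every item of the pattern."""
    if not records:
        return 0.0
    hits = sum(1 for record in records if pattern <= record)
    return hits / len(records)


def trend(pattern: FrozenSet[str],
          epochs: Sequence[Sequence[FrozenSet[str]]]) -> List[float]:
    """Support of the pattern in each epoch, in epoch order."""
    return [support(pattern, records) for records in epochs]


# Toy data (hypothetical): two epochs of item-set records.
epochs = [
    [frozenset({"a", "b"}), frozenset({"a"})],
    [frozenset({"a", "b"}), frozenset({"a", "b"}), frozenset({"b"})],
]
ab_trend = trend(frozenset({"a", "b"}), epochs)
```

Each trend is a fixed-length numeric vector, which is the kind of input a clustering step such as a Self Organising Map could take.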